Generative Adversarial Networks for Bitcoin Data Augmentation
In Bitcoin entity classification, results are strongly conditioned by the
ground-truth dataset, especially when applying supervised machine learning
approaches. However, these ground-truth datasets are frequently affected by
significant class imbalance, as they generally contain much more information
about legal services (Exchange, Gambling) than about services that may
be related to illicit activities (Mixer, Service). Class imbalance increases
the complexity of applying machine learning techniques and reduces the quality
of classification results, especially for underrepresented but critical
classes.
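As a rough illustration of the class imbalance described above, the following minimal sketch computes per-class shares and an imbalance ratio for a labeled dataset. The label counts are hypothetical and invented for illustration only; they are not taken from the paper, and the helper names `class_shares` and `imbalance_ratio` are assumptions of this sketch.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's fraction of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical label counts, invented for illustration (not from the paper).
labels = (
    ["Exchange"] * 700
    + ["Gambling"] * 250
    + ["Mixer"] * 40
    + ["Mining Pool"] * 10
)

shares = class_shares(labels)
ratio = imbalance_ratio(labels)

# A classifier that always predicts the majority class reaches this accuracy
# while never identifying a single minority-class entity.
majority_baseline_accuracy = max(shares.values())

print(shares)
print(ratio)
print(majority_baseline_accuracy)
```

Under these assumed counts, a trivial majority-class predictor already looks reasonably accurate overall, which is why overall accuracy can hide poor results on the underrepresented classes.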
In this paper, we propose to address this problem by using Generative
Adversarial Networks (GANs) for Bitcoin data augmentation, since GANs have
recently shown promising results in the domain of image classification. However, there
is no "one-size-fits-all" GAN solution that works for every scenario. In fact,
setting GAN training parameters is non-trivial and heavily affects the quality
of the generated synthetic data. We therefore evaluate how GAN parameters such
as the optimization function, the size of the dataset and the chosen batch size
affect GAN implementation for one underrepresented entity class (Mining Pool)
and demonstrate how a "good" GAN configuration can be obtained that achieves
high similarity between synthetically generated and real Bitcoin address data.
To the best of our knowledge, this is the first study presenting GANs as a
valid tool for generating synthetic address data for data augmentation in
Bitcoin entity classification.
Comment: 8 pages, 5 figures, 4 tables